46 research outputs found

    Evaluation and optimization of frequent association rule based classification

    Deriving useful and interesting rules from a data mining system is an essential task. Common problems include the discovery of random and coincidental patterns, patterns of no significant value, and the generation of a large volume of rules from a database. Work on sustaining the interestingness of rules generated by data mining algorithms is actively being developed. This paper presents a systematic way to evaluate the association rules discovered by frequent itemset mining algorithms, combining common data mining and statistical interestingness measures, and outlines an appropriate sequence for their use. The experiments are performed on a number of real-world datasets that represent diverse characteristics of data/items, and a detailed evaluation of the rule sets is provided. Empirical results show that, with a proper combination of data mining and statistical analysis, the framework is capable of eliminating a large number of non-significant, redundant and contradictory rules while preserving relatively valuable rules with high accuracy and coverage when used in the classification problem. Moreover, the results reveal important characteristics of mining frequent itemsets and the impact of the confidence measure on the classification task.
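
As a rough illustration of the kind of interestingness measures the abstract above refers to, the sketch below computes support, confidence and lift for a single rule. The abstract names only the confidence measure; the choice of support and lift, the function names and the toy transactions are assumptions made for illustration, not the paper's framework.

```python
from typing import Dict, Iterable, List, Set


def support(transactions: List[Set[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= frozenset(t))
    return hits / len(transactions)


def rule_measures(
    transactions: List[Set[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> Dict[str, float]:
    """Support, confidence and lift of the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    sup_a = support(transactions, a)
    sup_c = support(transactions, c)
    sup_ac = support(transactions, a | c)
    # Confidence: how often the consequent holds when the antecedent does.
    confidence = sup_ac / sup_a if sup_a else 0.0
    # Lift: confidence relative to the consequent's baseline frequency.
    lift = confidence / sup_c if sup_c else 0.0
    return {"support": sup_ac, "confidence": confidence, "lift": lift}


# Hypothetical toy data: items "a", "b" plus a class label "yes"/"no".
toy_transactions = [
    {"a", "b", "yes"},
    {"a", "yes"},
    {"b", "no"},
    {"a", "b", "yes"},
]
measures = rule_measures(toy_transactions, {"a"}, {"yes"})
```

A rule filter in this spirit would keep rules whose confidence and lift clear chosen thresholds; the thresholds themselves are dataset-dependent and are not given in the abstract.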

    Irrelevant feature and rule removal for structural associative classification

    In the classification task, the presence of irrelevant features can significantly degrade the performance of classification algorithms in terms of additional processing time, more complex models and the likelihood that the models generalize poorly because of overfitting. Practical applications of association rule mining often suffer from an overwhelming number of generated rules, many of which are not interesting or not useful for the application in question. Removing rules composed of irrelevant features can significantly improve overall performance. In this paper, we explore and compare the use of a feature selection measure to filter out unnecessary and irrelevant features/attributes prior to association rule generation. The experiments are performed on a number of real-world datasets that represent diverse characteristics of data items. Empirical results confirm that, by applying feature subset selection prior to association rule generation, a large number of rules with irrelevant features can be eliminated. More importantly, the results reveal that removing rules that contain irrelevant features improves the accuracy rate and the capability to retain the rule coverage rate of structural associative classification.

    A System Dynamic Simulation Model for Managing the Human Error in Power tools Industries.

    In today's modern and competitive environment, every organization faces situations in which work does not proceed as planned and problems cause delays. Human error is often cited as the culprit. Errors made by employees force them to spend additional time identifying and checking for the error, which in turn can affect the normal operations of the company as well as the company's reputation. Employees are a key element of the organization in running all of its activities. Hence, the work performance of employees is a crucial factor in organizational success. The purpose of this study is to identify the factors that cause the increase in errors made by employees in the organization by using a system dynamics approach. The broadly defined targets of this study are employees in the Regional Material Field team of the purchasing department in the power tools industry. Questionnaires were distributed to the respondents to obtain their perceptions of the root causes of errors made by employees in the company. A system dynamics model was developed to simulate the factors behind the increase in errors made by employees and its impact. The findings of this study showed that the increase in errors made by employees was generally caused by workload, work capacity, job stress, motivation and employee performance. However, this problem could be addressed by increasing the number of employees in the organization.

    Assessing stakeholder’s credit risk using data mining in construction project

    Nowadays, the rapid growth of national and global economies demands an efficient, innovative and cost-effective approach to building and infrastructure projects. Partnering in construction projects is complex in nature because of human and non-human factors. For instance, credit capacity is a common attribute from the client's perspective when selecting partners for a construction project. However, the assessment of the credit risk capacity of partners (such as project manager, quantity surveyor, consultant, and contractor) is neglected, particularly in design-build projects in Malaysia. Because of unforeseen risk associated with credit capacity, project delays and cost overruns occur frequently in the Malaysian construction industry. Thus, this research aims to develop a framework for assessing credit risk using data mining for design-build projects. This study will employ a case study approach in order to gather information, develop a data mining model and validate it with real case projects involving public clients. The framework will enable public clients to select appropriate partners for their construction projects with minimal risk. It is anticipated that this study will yield an efficient artifact to improve existing government procurement systems such as ePerolehan and e-Perunding.

    Constructing a customer satisfaction model for a utility service industry using partial least squares approach

    The purpose of this research is to explore the effect of Image, Customer Expectation, Perceived Quality and Perceived Value on Customer Satisfaction, and to investigate the effect of Image and Customer Satisfaction on Customer Loyalty for mobile phone providers in Malaysia. The results of this research are based on data gathered online from international students at one of the public universities in Malaysia. Partial Least Squares Structural Equation Modeling (PLS-SEM) was used to analyze the data collected on the international students' perceptions. The results show that Image and Perceived Quality have a significant impact on Customer Satisfaction. Image and Customer Satisfaction were also found to be significantly related to Customer Loyalty. However, no significant impact was found between Customer Expectation and Customer Satisfaction, Perceived Value and Customer Satisfaction, or Customer Expectation and Perceived Value. It is hoped that the findings may assist mobile phone providers in the production and promotion of their services.

    A conceptual framework for predicting the effects of encroachment on magnitude of flood in Foma-river area, Kwara State, Nigeria using data mining

    Flooding occurs naturally, but it is not in itself a hazard; it becomes one only after human encroachment into the floodplain. The continuous increase in property development along the floodplain and indiscriminate refuse disposal into water channels have been major contributing factors to intensive flooding along the floodplain. This is a result of the decline in the capacity of the floodplain to absorb excess floodwater, which exposes more urban areas to flooding. Foma-river is located in Ilorin, Kwara, Nigeria, on latitude N08,49574 and longitude E004,5107. The climate of Ilorin comprises dry and wet seasons, with the wet season starting around March and lasting for about four to five months. This study intends to propose a conceptual framework to support the prediction of the effects of season on the magnitude of flood in the Foma-river area using a data mining approach based on 7 years of sampled data from the Nigeria Meteorological Agency (NIMET) and questionnaire responses from residents along the Foma-river floodplain

    An innovative data mining and dashboard system for monitoring of Malaysian dengue trends

    Monitoring dengue fever has become an important task in reducing dengue outbreak crises. Such monitoring gives stakeholders such as the Ministry of Health Malaysia (MOH) a well-informed view of the status of dengue fever. Abundant dengue cases, including deaths, have been reported in Malaysia over the past year. Data from the Malaysian Open Data portal reveal that 21,900 cases of dengue fever were reported in 2012, with 35 deaths. However, this information is dispersed and circulated among several ministries and stakeholders. For example, information regarding dengue outbreaks belongs to MOH, while information on population and density belongs to another stakeholder. Putting this information into one monitoring system requires an innovative system that is capable of extracting data and information from several databases and of summarizing these data into meaningful information. Given the dangerous effects of dengue fever, one solution is to implement an innovative forecasting and dashboard system for dengue spread in Malaysia, with emphasis on early prediction of dengue outbreaks. Importantly, this research will convey to health policy makers such as the Ministry of Health Malaysia (MOH), practitioners, and researchers the importance of collaborating to explore potential strategies for reducing the future burden of increasing dengue transmission cases in Malaysia.

    Customer satisfaction model for mobile phone service providers in Malaysia

    This paper presents an investigation of the effect of image, customer expectation, perceived quality and perceived value on customer satisfaction with mobile phone providers in Malaysia. The effect of image and customer satisfaction on customer loyalty is also explored. Data were gathered through an online questionnaire distributed to international students at a selected public university in Malaysia. Partial Least Squares Structural Equation Modeling (PLS-SEM) was used to analyze the data. The results show that image and perceived quality have a significant impact on customer satisfaction, with regression coefficient values of 0.398 and 0.382 respectively. Image and customer satisfaction were also found to be significantly related to customer loyalty, with regression coefficient values of 0.378 and 0.409 respectively. On the other hand, no significant impact was found between customer expectation and customer satisfaction, perceived value and customer satisfaction, or customer expectation and perceived value.

    A study of graduate on time (GOT) for Ph.D students using decision tree model

    Over the years, there has been exponential growth in the number of Doctor of Philosophy (Ph.D) graduates in most universities around the world. The increase in Ph.D students has caused both universities and government bodies to be concerned about the capability of Ph.D students to meet the Graduate on Time (GOT) target stipulated by the university. Therefore, this study aims to classify Ph.D students into the groups "GOT achiever" and "non-GOT achiever" by using decision tree models. Historical data on all Ph.D students in a public university in Malaysia were obtained directly from the database of the Graduate Academic Information System (GAIS) in order to develop and compare the performance of decision tree models (Chi-square algorithm, Gini index algorithm, Entropy algorithm and an interactive decision tree). The results of the four decision tree models show that English background, gender and the Ph.D students' entry Cumulative Grade Point Average (CGPA) are the core attributes affecting the students' success. Among all models, the decision tree model with the Entropy algorithm performs best, scoring the highest accuracy rate (72%) and sensitivity rate (95%). Therefore, it was selected as the best model for predicting the ability of Ph.D students to achieve GOT. The outcome can ease the burden on universities in handling and controlling the GOT issue. The model can also be used by the university to uncover the constraints behind this issue so that better plans can be carried out to boost the number of GOT achievers in future.
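
The abstract above compares decision trees built with Gini index and Entropy criteria and reports accuracy and sensitivity. The sketch below shows, under stated assumptions, how those two impurity measures and two metrics are commonly computed. The function names and the toy labels are hypothetical; the study's actual data, tooling and Chi-square variant are not reproduced here.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(
    parent: Sequence[str],
    children: List[Sequence[str]],
    impurity: Callable[[Sequence[str]], float],
) -> float:
    """Impurity reduction achieved by splitting `parent` into `children`."""
    n = len(parent)
    weighted = sum(len(ch) / n * impurity(ch) for ch in children)
    return impurity(parent) - weighted


def accuracy_and_sensitivity(
    actual: Sequence[str], predicted: Sequence[str], positive: str
) -> Tuple[float, float]:
    """Accuracy over all cases and sensitivity (recall) for `positive`."""
    pairs = list(zip(actual, predicted))
    correct = sum(1 for a, p in pairs if a == p)
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return correct / len(pairs), sensitivity


# Hypothetical toy labels, not data from the study.
parent = ["GOT", "GOT", "non-GOT", "non-GOT"]
perfect_split = [["GOT", "GOT"], ["non-GOT", "non-GOT"]]
gain_entropy = split_gain(parent, perfect_split, entropy)
gain_gini = split_gain(parent, perfect_split, gini)
acc, sens = accuracy_and_sensitivity(
    parent, ["GOT", "non-GOT", "non-GOT", "non-GOT"], positive="GOT"
)
```

A tree learner picks, at each node, the attribute split with the largest gain under the chosen criterion; which criterion works best is an empirical question, as the comparison in the abstract suggests.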

    Evaluation of machine learning classifiers in faulty die prediction to maximize cost scrapping avoidance and assembly test capacity savings in semiconductor integrated circuit (IC) manufacturing

    Semiconductor manufacturing is a complex and expensive process. Semiconductor packaging is trending towards more complex packages with higher performance and lower power consumption. The silicon die is manufactured using smaller fab process technology nodes, and the packaging technology is more complex and expensive. The packaging trend has evolved from single-die packaging to multi-die packaging, which also requires more processing steps and tools in the assembly process. All these factors cause the cost per unit to increase. Multi-die packaging results in a higher loss in production yield than single-die packaging because the overall yield is now the product of the yields of the individual dies. If any die in the final package is tested at Class and found to be faulty, not meeting the product specification, the whole package will be scrapped even if the remaining dies pass the tests. This wastes good raw material (good dies and good substrate) and the manufacturing capacity used to assemble and test the affected bad package. In this research work, a new framework is proposed for model training and evaluation for machine learning applications in semiconductor test, with the objective of screening bad dies using machine learning before die attachment to the package. The model training flow has 2 classifier groupings, a control group and auto machine learning (ML), where feature selection with a redundancy elimination method is applied to the input data to reduce the number of variables to a minimum prior to the modeling flow. The control group serves as a reference. The other group uses auto machine learning (ML) to run multiple classifiers automatically, and only the top 3 are selected for the next step. The performance metric used is the recall rate at a specified precision derived from the ROI breakeven point. The threshold probability that corresponds to the fixed precision is set as the classifier threshold during model evaluation on unseen datasets. The model evaluation flow uses 3 different non-overlapping datasets, and the comparison of classifiers is based on recall rate and precision rate. This new framework is able to provide the range of possible recall rates, from minimum to maximum, to identify which classifier algorithm performs best for a given dataset. The selected model can be implemented in the actual manufacturing flow to screen predicted bad dies for maximum cost scrapping avoidance and capacity savings.
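
Two ideas in the abstract above lend themselves to a small worked sketch: package yield as the product of per-die yields, and choosing a classifier threshold that meets a fixed precision and then reading off recall. The code below is an illustration under assumptions, not the paper's framework: the function names, scores, labels and the 0.75 precision target are hypothetical, and the target precision is taken as a given input rather than derived from an ROI breakeven calculation.

```python
from math import prod
from typing import List, Optional, Sequence, Tuple


def package_yield(die_yields: Sequence[float]) -> float:
    """Overall yield of a multi-die package: product of per-die yields."""
    return prod(die_yields)


def threshold_for_precision(
    scores: Sequence[float], labels: Sequence[int], target_precision: float
) -> Optional[Tuple[float, float]]:
    """Pick the score threshold with the highest recall among those whose
    precision is at least `target_precision`.

    `labels` uses 1 for a bad die (the positive class) and 0 for a good die.
    Returns (threshold, recall), or None if no threshold meets the target.
    """
    total_pos = sum(labels)
    best: Optional[Tuple[float, float]] = None
    # Every distinct score is a candidate threshold; predict "bad" when
    # the score is at or above the threshold.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        precision = tp / (tp + fp)  # tp + fp >= 1 because t is a score
        recall = tp / total_pos if total_pos else 0.0
        if precision >= target_precision and (best is None or recall > best[1]):
            best = (t, recall)
    return best


# Hypothetical predicted bad-die probabilities and true labels.
toy_scores: List[float] = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_labels: List[int] = [1, 1, 0, 1, 0, 0]
chosen = threshold_for_precision(toy_scores, toy_labels, target_precision=0.75)
two_die_yield = package_yield([0.9, 0.9])
```

In an evaluation flow like the one described, the threshold would be fixed on training data and then applied unchanged to the unseen datasets, where the achieved recall and precision are measured.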